Genetic Fuzzy Discretization for Classification Problems

Authors

  • Yoon-Seok Choi
  • Byung Ro Moon
Abstract

Many real-world classification algorithms cannot be applied unless continuous attributes are discretized, and interval discretization methods are used in many machine learning techniques. It is hard to determine the intervals for discretizing a numerical attribute, because the number of candidate boundaries is infinite. Moreover, interval discretization methods are based on crisp sets: a value of a continuous attribute must belong to exactly one interval. They are therefore often ill-suited to describing a value located near an interval boundary. Fuzzy partitioning is an attractive method for such cases in classification problems. An important decision in fuzzy partitioning concerns the positions of the interval boundaries and the degrees of overlap between the fuzzy sets. We optimize the parameters that specify the fuzzy partitioning with genetic algorithms. We divide the range of a continuous attribute into k intervals and represent each value by a k-bit string in which each bit corresponds to one interval. The i-th bit of the binary string indicates whether the value belongs to the i-th interval. While a value belongs to only one interval in a simple discretization, it can belong to more than one interval in fuzzy discretization. Thus a value can be represented by a binary mask. For example, in Fig. 1, the value 0.595 belongs to the third and fourth intervals and is represented by the binary mask 00110. This representation gives machine learning algorithms more flexibility in pattern classification. We optimize the boundaries of the intervals and the degrees of overlap in fuzzy discretization. We use four parameters for each interval I_i: t_i, t_{i+1}, l_i and u_i. The genetic fuzzy membership function is defined as follows:
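The membership function itself is cut off in this abstract, so the sketch below does not reproduce it. It only illustrates the k-bit mask encoding described above, under stated assumptions: interval I_i is taken to cover [t_i - l_i, t_{i+1} + u_i], membership is treated as a plain yes/no test, and the boundary and overlap values are hypothetical numbers chosen so that 0.595 yields the mask 00110 (Fig. 1 is not available here). The function name `fuzzy_mask` is likewise an illustration, not the authors' code.

```python
from typing import Sequence

def fuzzy_mask(value: float,
               t: Sequence[float],
               l: Sequence[float],
               u: Sequence[float]) -> str:
    """Encode `value` as a k-bit mask, one bit per interval.

    Assumption (not from the paper): interval i spans the closed range
    [t[i] - l[i], t[i+1] + u[i]], i.e. l and u widen the crisp interval
    [t[i], t[i+1]] to the left and right so neighbours can overlap.
    """
    k = len(t) - 1  # k intervals need k + 1 boundaries
    if len(l) != k or len(u) != k:
        raise ValueError("l and u must each hold one entry per interval")
    bits = []
    for i in range(k):
        lower = t[i] - l[i]
        upper = t[i + 1] + u[i]
        bits.append("1" if lower <= value <= upper else "0")
    return "".join(bits)

# Hypothetical parameters: five equal-width intervals on [0, 1],
# each widened by 0.05 on both sides.
T = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
L = [0.05] * 5
U = [0.05] * 5

print(fuzzy_mask(0.595, T, L, U))              # overlapping intervals
print(fuzzy_mask(0.595, T, [0.0] * 5, [0.0] * 5))  # no overlap (crisp case)
```

With these assumed parameters, 0.595 falls inside both the widened third and fourth intervals, while setting every overlap to zero leaves it in the third interval only. In the approach the abstract describes, a genetic algorithm would search over values such as `T`, `L` and `U` rather than fixing them by hand.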


Similar resources

Feature Selection in Genetic Fuzzy Discretization for the Pattern Classification Problems

We propose a new genetic fuzzy discretization method with feature selection for pattern classification problems. Traditional discretization methods categorize a continuous attribute into a number of bins. Because they rely on crisp discretization, considerable information is lost. Fuzzy discretization allows overlapping intervals and reflects linguistic classification. However...


A New Method Based on Genetic Programming for Weighting Fuzzy Rules in Imbalanced Classification

In classification problems, we often encounter datasets with different percentages of patterns (i.e. classes with a high pattern percentage and classes with a low pattern percentage). These problems are called “classification problems with imbalanced data-sets”. Fuzzy rule based classification systems are the most popular fuzzy modeling systems used in pattern classification problems. Rule weights...


FRULER: Fuzzy Rule Learning through Evolution for Regression

In regression problems, the use of TSK fuzzy systems is widely extended due to the precision of the obtained models. Moreover, the use of simple linear TSK models is a good choice in many real problems due to the easy understanding of the relationship between the output and input variables. In this paper we present FRULER, a new genetic fuzzy system for automatically learning accurate and simpl...


A hybrid filter-based feature selection method via hesitant fuzzy and rough sets concepts

High dimensional microarray datasets are difficult to classify since they have many features with small number of instances and imbalanced distribution of classes. This paper proposes a filter-based feature selection method to improve the classification performance of microarray datasets by selecting the significant features. Combining the concepts of rough sets, weighted rough set, fuzzy rough se...


OFP_CLASS: a hybrid method to generate optimized fuzzy partitions for classification

The discretization of values plays a critical role in data mining and knowledge discovery. The representation of information through intervals is more concise and easier to understand at certain levels of knowledge than the representation by mean continuous values. In this paper, we propose a method for discretizing continuous attributes by means of a series of fuzzy sets, which constitutes a f...




Publication date: 2004